Tree-formed Verification Data for Trusted Platforms 



Andreas U. Schmidt and Andreas Leicher 

Novalyst IT AG 

Robert-Bosch-Straße 38, 61184 Karben, Germany

Email at: http://andreas.schmidt.novalyst.de/ 



Yogendra Shah and Inhyok Cha 

InterDigital Communications, LLC 

781 Third Avenue, King of Prussia, PA 19406 

Email: {yogendra.shah,inhyok.cha}@InterDigital.com






Abstract — The establishment of trust relationships to a trusted 
platform relies on the process of validation. Validation allows an
external entity to build trust in the expected behaviour of the 
platform based on provided evidence of the platform's configu- 
ration. In a validation mechanism such as remote attestation, the 
trusted platform exhibits verification data created during a start 
up process. These data consist of hardware-protected values of
platform configuration registers, containing nested measurement 
values, i.e., hash values, of all loaded or started components. 
The values are created in linear order by the secured extend 
operation. Fine-grained diagnosis of components by the validator, 
based on the linear order of verification data and associated 
measurement logs, is inefficient. We propose a method to create
tree-formed verification data, in which component measurement
values represent leaves and protected registers represent roots. It 
is shown how this is possible using a limited number of hardware- 
protected registers and the standard extend operation. In this 
way, the security of verification data is maintained, while the 
stored measurement log is consistently organised as a tree. We 
exhibit the basic mechanism of validating a platform using tree- 
formed measurement logs and verification data. 

I. Introduction

In a nutshell, the process of building trust in computing 
platforms follows a unique, common pattern [1]. During start
up of the platform, all components are measured by a protected 
entity on the platform before they are loaded and executed. 
The generation of a chain of trust is an important concept for 
a Trusted Computing System. This chain must extend without 
gaps from system boot up to the current system state, including 
all executed instructions and programs. Every component is 
required to measure and report the following component before 
executing it. Measurement of the direct successor prevents un- 
monitored execution of code between measurement and actual 
execution. The measurement process is protected by the root 
of trust for measurement, and can be implemented for instance 
by computing a digest value over code and configuration data. 
Verification data is compiled from the measurement values 
by a protected operation and stored in protected storage. The 
verification data identifies, after completion of secure start up, 
the platform's state uniquely. Important embodiments of these 
processes are authenticated and secure boot specified by the 
Trusted Computing Group (TCG). In [2], authenticated boot
is specified for PC clients, whereas [3] specifies secure boot
for mobile platforms. The difference is essentially that secure 
boot adds a local verification and enforcement engine that 
lets components start only if their measurements are equal 
to trusted reference values. 
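
The difference between the two boot variants can be sketched in a few lines. This is a minimal illustration, not TCG-specified code: the component names, the `REFERENCE` table, and the functions `measure` and `boot` are hypothetical, and SHA-1 is chosen only because it is the hash used by the TPM.

```python
import hashlib

def measure(code: bytes) -> bytes:
    """Measurement value: a digest over component code (SHA-1 as in the TPM)."""
    return hashlib.sha1(code).digest()

def boot(components, reference=None):
    """Measure each component before it is started.

    With reference=None this mimics authenticated boot: everything is
    measured and logged, nothing is blocked. With a reference table it
    mimics secure boot: a component starts only if its measurement equals
    the trusted reference value.
    """
    log, started = [], []
    for name, code in components:
        m = measure(code)
        log.append((name, m))        # measure and report before execution
        if reference is not None and reference.get(name) != m:
            break                    # local enforcement: stop the boot here
        started.append(name)
    return log, started

# Hypothetical boot chain and its trusted reference values.
chain = [("loader", b"loader-v1"), ("kernel", b"kernel-v1"), ("app", b"app-v1")]
REFERENCE = {name: measure(code) for name, code in chain}

tampered = [("loader", b"loader-v1"), ("kernel", b"kernel-EVIL"), ("app", b"app-v1")]
auth_log, auth_started = boot(tampered)             # authenticated boot
sec_log, sec_started = boot(tampered, REFERENCE)    # secure boot
```

In the authenticated case the modified kernel still runs and is merely recorded; in the secure case the chain stops at the first mismatch.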



TCG proposes to compute verification data via the extend
operation of the Trusted Platform Module (TPM, [4]),
respectively, the Mobile Trusted Module (MTM, [5]), from
measurement values, which are hashes of component code 
and/or data. The data are stored in Platform Configuration 
Registers (PCRs, a minimum of 16 according to version 1.1 
of the specification, at least 24 in version 1.2) in the TPM, 
where they can only be accessed by authorised commands. The 
extend operation builds a linearly ordered, nested chain of hash 
values, akin to the Merkle-Damgard transform, as follows: 



\[ V_i \leftarrow V_i \diamond m := H(V_i \,\|\, m), \tag{1} \]

where $V_i$ denotes a verification data register ($i = 0, \dots, 23$
for PCRs), $H$ is a collision-resistant hash function (SHA-1 in
case of the TPM), and $m = H(\text{data})$ is a measurement value.
Thus, verification data of a TCG trusted platform is secured 
against manipulation by the TPM's protected functions and 
shielded capabilities. The verification data is accompanied by 
a more expressive record of measurement values and other, 
component-specific data in the stored measurement log (SML). 
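
A minimal sketch of the linear extend chain of Eq. (1) may help to fix ideas. The function names and the all-zero initial register value are assumptions made for illustration; a real TPM performs the extend inside protected hardware.

```python
import hashlib

PCR_SIZE = 20  # SHA-1 digest length in bytes

def H(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def extend(v: bytes, m: bytes) -> bytes:
    """Eq. (1): the new register value is H(V || m)."""
    return H(v + m)

def authenticated_boot(components):
    """Return (final register value, SML) for a linear chain of measurements."""
    v = bytes(PCR_SIZE)      # assumption: the register starts at all zeros
    sml = []
    for data in components:
        m = H(data)          # measurement value m = H(data)
        sml.append(m)        # unprotected log entry
        v = extend(v, m)     # protected, order-dependent accumulation
    return v, sml

v_ab, sml_ab = authenticated_boot([b"a", b"b"])
v_ba, sml_ba = authenticated_boot([b"b", b"a"])
```

Because each value is nested into the next hash, the final register value depends on both the measurements and their order, which is why `v_ab` and `v_ba` differ.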

In validation toward an external entity, verification and other 
data, such as the SML, is signed by the platform and trans- 
ferred to the validator. The validator is then able, in principle, 
to assess the trustworthiness of the platform to any desired 
granularity, limited only by the total information conveyed 
during validation. Again, paradigmatic embodiments for validation
are defined by the TCG in the attestation protocols [6].
The TCG envisages that validation may eventually be used
to take remedial steps on trusted platforms, for instance upon
first network or service access, as envisioned by the Trusted
Network Connect working group of the TCG [7].

We propose a method to organise verification data and 
SML differently from the linear order foreseen by TCG 
specifications, in a tree, more precisely a Merkle hash tree [8],
[9]. In Section II, the efficiency problem with linearly chained
verification data is highlighted from the viewpoint of applica- 
tions. The central security problem in organising verification 
data as a tree is to make their generation as secure as the 
measurement-extend operations of TCG specifications. We 
point out why this problem is not yet covered in the existing 



literature. Section III presents the core method and algorithm 
to generate verification data in a limited set of hardware 
protected registers, which truthfully represents the root nodes 
of a hash tree. Section IV shows how tree-like verification data
and SML can efficiently and effectively be used for validation.
Section V discusses implementation options for tree-formed
verification data. Section VI concludes the paper with plans 
for further work. 

II. Problem Statement and Related Work 

The present section introduces the requirements for more 
expressive methods to communicate a platform state to an 
external validator. We then state the basic security issue for
the formation of verification data which represents structured 
sets of measurements on started/loaded components in authen- 
ticated or secure boot. Finally, we highlight the novelty of our 
approach relative to previous, related work. 

A. The Need for Structured Verification Data 

Verification data provide information about a system's state 
with unconditional security. In particular, they are secure in- 
dependently of the SML, which, according to TCG standards, 
has no particular protection on the platform or in a validation 
(it is not part of the signed attestation data). Only the signed 
PCR values, i.e., verification data itself, provides an implicit 
integrity control for the SML. For this, the verification data 
must be recalculated from the measurements in the SML, 
by retracing all extend operations. The TCG-standardised 
way to use PCR values in authenticated boot to secure the 
measurement log is based on the technique introduced by 
Schneier and Kelsey for securing audit logs on untrusted
machines [10], [11]. In fact, it is a simplification, since only the
last element of the hash chain is kept in a PCR, while the SML
normally contains only the measurement values and not the
intermediate entries of the hash chain. Integrity measurement
using the TPM is implemented in the Integrity Measurement
Architecture (IMA) [12], a Linux kernel module that measures
integrity using the TPM and generates a linear SML.
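
The implicit integrity control described above amounts to replaying the log. The following sketch assumes an all-zero initial register and uses hypothetical helper names (`replay`, `sml_is_consistent`); it is meant only to show what a validator can and cannot learn from a linear chain.

```python
import hashlib

def extend(v: bytes, m: bytes) -> bytes:
    return hashlib.sha1(v + m).digest()

def replay(sml, v0=bytes(20)):
    """Retrace all extend operations over the logged measurement values."""
    v = v0
    for m in sml:
        v = extend(v, m)
    return v

def sml_is_consistent(sml, signed_pcr: bytes) -> bool:
    """True iff the log reproduces the signed register value."""
    return replay(sml) == signed_pcr

# Hypothetical log of four measurements and the register value it produces.
sml = [hashlib.sha1(c).digest() for c in (b"bios", b"loader", b"kernel", b"app")]
pcr = replay(sml)

# A log in which one entry was replaced after the fact.
forged = list(sml)
forged[1] = hashlib.sha1(b"other loader").digest()
```

The check yields a single yes/no answer for the whole log: `sml_is_consistent(forged, pcr)` is false, but nothing in the result indicates which entry was altered.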

Thus, the state of the art of verification data, created by
linearly chaining extend operations, is only of limited value for
remote diagnostics of a platform, and advanced management
such as component-wise remediation. Essentially, the position
of a manipulation of the SML, either by tampering with a
measurement value before it is extended into a PCR, or by
tampering with the SML itself after secure start up, cannot be
localised with certainty. Furthermore, the space complexity of
real world SMLs with many hundreds, or thousands, of measured
components makes sifting through them for components
which fail validation, i.e., for which the measurement value differs
from a "good" reference value, costly. In fact, for checking of
code and data there are a variety of cryptographic checksum 
functions available, and all obviously require that the integrity 
of the checksums for the "correct" data be maintained. The 
requirement for a centralised database of all software in 
all valid versions on the various machines is a significant 
management problem, in need of an efficient solution. 

Future, large scale deployments of networked devices, such
as required in machine-to-machine communication scenarios,
require a solid device- and network-side, balanced and efficient
trust infrastructure [1], [13]. Security requirements are
particularly high for devices loosely connected to networks
and operating semi-autonomously. Scenarios considered by
the industry [14] always entail the high-level requirement
for remote integrity check, or validation, of a connecting
device. Making validation expressive, efficient, and secure
is a primary necessity. The specifications of the TCG Infrastructure
working group contain an approach to this problem,
hierarchically distinguishing between verified components and
sub-components [6]. On the academic side, Lo Presti [15]
proposed a Tree of Trust (ToT) concept and notation to
represent a platform's structure. A ToT's nodes represent
platform components, from TPM up to applications, annotated
with trust and security statements. It can be used to assess the
trust that should be put into the platform, or even to reorganise
the platform according to certain constraints. Another technical
domain where the shortcomings of a merely linear chain of
trust become apparent is virtualisation. Virtual machines are
created and destroyed dynamically on potentially many layers,
resulting in "a tree-like, dynamic structure of trust dependencies"
[16, p. 6]. While the community has acknowledged that
structured validation data is required to truly assess platforms'
trustworthiness, a granular association of such tree-formed
data hierarchies to verification data (PCR values) is lacking.

B. Basic Idea and Security Issue 

Here we propose to organise verification data and SML into 
a binary tree structure. In such a structure, verification data 
registers are the roots, the SML data structure contains the 
inner nodes and the leaves, which in turn are the component 
measurement values. The whole structure is a representative 
of the class of Merkle hash trees [8], [9]. The method can be
generalised to n-ary and arbitrary trees. Figure 1 shows the
general concept. 
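
As a point of reference, a binary hash tree over measurement values can be built in memory as follows. This is a plain textbook construction with hypothetical names (`node`, `build_tree`), not the register-limited procedure developed later; it only illustrates why a tree localises a deviating leaf.

```python
import hashlib

def node(left: bytes, right: bytes) -> bytes:
    """Inner node value, formed like an extend: H(left || right)."""
    return hashlib.sha1(left + right).digest()

def build_tree(leaves):
    """Return all levels of a full binary hash tree, leaves first, root last.

    Assumes len(leaves) is a power of two.
    """
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([node(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

leaves = [hashlib.sha1(bytes([i])).digest() for i in range(8)]
good = build_tree(leaves)

# Replace the measurement of one component (leaf 5) and rebuild.
bad_leaves = list(leaves)
bad_leaves[5] = hashlib.sha1(b"modified component").digest()
bad = build_tree(bad_leaves)

# Per level, the positions whose value differs from the reference tree.
diff = [[i for i, (a, b) in enumerate(zip(g, x)) if a != b]
        for g, x in zip(good, bad)]
```

Only the nodes on the path from the modified leaf to the root change, so `diff` contains exactly one position per level. A validator comparing against a reference tree can follow these positions downward from the root.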




Fig. 1. General structure of tree-formed SML and corresponding verification
data. The star represents the root of the tree stored in a verification data
register. Components (code and/or data) are indicated by packets at the leaves.
Measurement hashes of the components are indicated by slip knots. Inner
nodes (coloured balls) transport verification information upstream to the root.
The golden lines hint at the traversal of the tree for validation, explained in
more detail in Section IV.

Secure creation of verification data which represents root 
nodes of hash trees poses a particular problem. In the normal 
extend operation, only the measurement value taken by the 
Root of Trust for Measurement (RoTM) on a component, and 
the current verification data register value $V_i$ are used, and the
operation itself is carried out in the hardware protected TPM.
Thus, in particular, previous measurements stored without
protection in the SML are not used in the generation process.
This is not possible for a hash tree, where adding a new
leaf always affects $d - 2$ inner nodes of the tree, where
$d$ is the tree's depth. The challenge is to generate tree-formed
verification data exclusively inside a limited number
of hardware protected registers (PCRs), using only a single
leaf measurement value as input, and employing only the
normal TPM extend operation and other TPM capabilities.
This problem is solved in Section III.

While we lean on TCG nomenclature and some concepts,
it will be clear from the minimal requirements imposed
on a system creating and protecting tree-formed verification
data that the concepts developed in the following sections
are not restricted to platforms and secure hardware elements
adhering to TCG standards.

C. Related Work 

Verification of programs before loading and while booting 
was first mentioned in [17, Sections 6.2 and 6.3], where
a formalisation of the process is given and the concept of 
attestation appears. Code authentication is among the primary 
goals of Trusted Computing [18]-[20]. Early work on protecting
executed code by securing start up of a platform, such
as Dyad [21], proposes hardware mechanisms to bootstrap
trust in the host with secure coprocessors on standard PC 
hardware, and shows the first important applications of trusted 
platforms. Secure hardware must be involved in the secure 
bootstrap process. For instance, a secure coprocessor may halt 
the boot process if it detects an anomaly. This assumes that the 
bootstrap ROM is secure. To ensure this, the system's address 
space could be configured such that the boot vector and the 
boot code are provided by a secure coprocessor directly or 
the boot ROM itself could be a piece of secure hardware. 
Regardless, a secure coprocessor verifies the system software 
(OS kernel, system related user-level software) by checking 
the software's signature against known values [21].

Tamper resistance of code has been considered by many 
researchers. A prototypical approach to the problem is rooting 
trust for program execution in hardware, such as the XOM 
(eXecute Only Memory [22]) processor architecture, and the
XOM Operating System [23] building on it. This does not
solve the problems of securely loading a program and attesting
to external entities. AEGIS [24] shows secure boot on a
PC. AEGIS uses a signed hash to identify each layer in the
boot process, as does Terra [25], which can attest loaded
components with a complete chain of certificates ending in 
attestation of virtual machines. 

Existing TCG specifications define a bi-lateral remote attes- 
tation to verify the integrity of a platform remotely, by veri- 
fying the binary executables. All executed code is measured 
when it gets loaded. The measurements are stored in PCRs as 
verification data, and the TPM attests to these data by signing 
them with a TPM protected key. The verifier can, upon receipt 
of these metrics, decide if the platform can be considered 
trustworthy. Since the whole configuration is transmitted and
verified, the verifier needs to know all configurations of
all machines. Furthermore, binary attestation discloses the
complete configuration and thus poses a privacy risk. In [26]
and [27], [28] "property," respectively, "property-based attestation"
(PBA) are proposed. PBA allows assuring the verifier of
security properties of the verified platform without revealing
detailed configuration data. A trusted third party (TTP) is used
to issue a certificate which maps the platform's configuration
to the properties (in particular desired/undesired functionality)
which can be fulfilled in this configuration. The TPM can
then, using a zero-knowledge proof, attest these properties 
to the verifier without disclosing the complete configuration. 
Essentially, PBA moves the infrastructural problem of platform 
validation to a TTP, similarly to, but extending the role of, the 
TCG's privacy CA. 

Another alternative is presented by the Nexus OS [29],
which builds on a minimal Trusted Computing Base (TCB) to
establish strong isolation between user space and privileged
programs. Nexus has secure memory regions and monitoring
and enforcement machines to protect them. One application
is to move device drivers into user space [30]. Attestation by
Nexus attaches descriptive labels to monitored programs and
thus allows for expressiveness similar to PBA, but system-immanent.
Neither the PBA concept nor the Nexus
approach has means to validate a complex system
comprised of a multitude of components, which furthermore
shall be dynamically managed. Both approaches are orthogonal
to the present one, and could be combined with it.

Hierarchical Integrity Management (HIM), see [31],
presents a dynamical framework for component-wise integrity
measurement and policy-enabled management of platform
components. Components and sub-components are related in
HIM via dependency graphs, the most general structure that
is useful for this purpose [32], [33]. But HIM is not aimed at
(remote) platform validation and does not protect structured
platform verification data in a PCR. Rather, it holds measurements
together in a global Component Configuration
Register (software registers) table.

The main intended application of the hash trees introduced 
by Merkle for integrity protection of large datasets is in certificate
management in a PKI. This yields long-term accountability
of CAs, using Merkle trees [34], or authenticated search
trees [35]. Various groups have extended the use of hash trees
to general long-term secure archiving for digital data [36],
[37]. Corresponding data structures have been standardised in
the so-called Evidence Record Syntax, by the IETF [38].

A lot of research work has gone into the usage of hash trees
for run-time memory protection. See Elbaz et al. [39] and Hu
et al. [40] for recent topical overviews of the state of
the art. Typical systems employing hash trees for storage and
memory protection [41]-[43] separate a system into untrusted
storage and a TCB. A program running on the TCB uses hash
trees to maintain the integrity of data stored on an untrusted
storage, which can be, e.g., some easily accessible, bulk store
in which the program regularly stores and loads data which
does not fit into the TCB. Gassend et al. [42] also propose to
store the root of the entire tree in an on-chip trusted register
of constant size, but keep all other nodes in main memory
or cache.

The work most closely related to the present one is
the proposal of Sarmenta, van Dijk, et al. [44]
to protect arbitrary memory objects via hash trees which in
turn are protected by a root in TPM non-volatile memory.
In [44], a new TPM command TPM_ExecuteHashTree is
introduced which allows adding, deleting, and updating so-called
TPM_COUNTER_BLOB objects, and which issues a certificate,
signed by an AIK, that attests to the successful verification
of that object's data with respect to the hash tree's root.
While this is a fully general method for handling arbitrary
data sets in a TPM-protected hash tree, it does not address the
special problem of building the tree from sequentially arriving
measurement values while maintaining the same security properties
as the normal TPM_Extend command, cf. Section II-B.

A different usage of hash trees is proposed in [45], where
it is shown how they can support authentication of distributed
code in Wireless Sensor Networks (WSN). Also in WSN,
data aggregation involving multiple nodes may be integrity
protected using hash trees [46]. Different from hash trees,
another potential approach to make verification data searchable
is Authenticated Append-only Skip Lists [47], which are
sorted linked lists designed to allow fast lookup of the stored 
data elements by taking "shortcuts." However, trees are better 
suited for validation of a platform's state, in particular to 
efficiently determine the subset of components at the leaves 
failing validation. 

Relative to the cited state-of-the-art, our present contri- 
butions are twofold. First, we introduce a new method to 
generate a binary Merkle tree from component measurement 
values using only a limited set of tamper-resistant verification 
data registers, and existing capabilities of the TPM, i.e., the 
standard extend operation. The algorithm is small enough to 
be executed within a TCB, in particular on-chip. This part of 
our proposed method increases security of the generation of 
the root of a hash tree, which in turn provides more security to 
the tree nodes. This problem is, to the best of our knowledge, 
not considered in the literature. Second, we show how to 
exploit the tree structure for efficient validation with enhanced 
diagnostic capabilities over common PCR values and SMLs, 
to increase security features of remote platform validation, 
and concurrently benefit from the efficiency of tree-like
structures in the search for failure points. This use of tree-structured
data for secure diagnostics, validation, or attestation
(all fields to which the proposed concepts apply) has also not
been considered elsewhere, to the best of our knowledge.

III. Secure Generation of Tree-Formed 
Verification Data 

In this section, we show a practical solution for the problem 
described, using only a limited number of verification data 
registers to securely generate one root verification value. 

It should be noted that all references to the concrete
embodiments of Trusted Computing specified by the TCG
made in this paper, in particular TPM operations, PCRs, and
SML, are examples of possible realisations of the presented
concepts. The algorithms and procedures can in principle
be applied to every security technology with the minimum
capabilities which are used by them.

A. Tree Formation Procedure 

In our proposed solution, one of the hardware protected
registers $\mathcal{V} = \{V_1, \dots, V_r\}$, e.g., PCRs, contains the root of
the final tree. The tree is chosen to be binary, to keep the
algorithm as compact as possible and to provide a fine grained
detection of failed components. The leaves carry the
measurement values, while the inner nodes are stored in a
modified SML. The SML is modified to support
the tree structure of the validation data, i.e., it is no longer a
linear list of measurement values; instead, the data structure must
support standard tree operations and traversals. For efficient
search during platform validation, the SML must support the
addition of new leaves and retain edge relations. Adding a
new measurement at a leaf to the tree at depth $d$ requires
recalculation of all $d - 1$ inner nodes of the leaf's reduced hash
tree and the tree root which is stored in a $V \in \mathcal{V}$. A Merkle
tree has a natural colouring of edges as "left", respectively,
"right" ones, since the binary extend operation (1) is non-commutative.
Leaves inherit this order and are added from
left to right. The binary, $d$-digit representation of leaf $n$,
$0 \le n \le 2^d - 1$, denoted by $\langle n \rangle$, yields natural coordinates for the
inner nodes and edges on the unique path from leaf to root.
That is, the $k$-th digit (counted from the MSB, $k = 1, \dots, d$),
$\langle n \rangle_k$, determines whether the node at depth $k - 1$ on this path
is connected by a left, respectively, a right edge, by $\langle n \rangle_k = 0$,
or, $\langle n \rangle_k = 1$, respectively.
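
The coordinate convention can be illustrated with a short sketch. The helper names `digit` and `path_from_root` are ours and purely illustrative; the digit is counted from the most significant bit as in the text.

```python
def digit(n: int, k: int, d: int) -> int:
    """k-th binary digit of leaf index n, counted from the MSB (k = 1..d)."""
    return (n >> (d - k)) & 1

def path_from_root(n: int, d: int):
    """Edges on the unique path from the root down to leaf n."""
    return ["right" if digit(n, k, d) else "left" for k in range(1, d + 1)]

# Leaf 5 in a tree of depth 3 has the coordinate 101.
edges = path_from_root(5, 3)
```

For leaf 5 in a tree of depth 3 the digits 1, 0, 1 translate into a right, a left, and again a right edge.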

We make the following assumptions: (1) the root of every
subtree created during the execution of the algorithm must
always be stored securely in a $V \in \mathcal{V}$. (2) If two subtrees
(measurement values are subtrees of depth 0) with the same
depth $d'$ exist, they can be merged to a single tree of depth
$d' + 1$. (3) The merge operation must preserve assumption (1),
i.e., one of the two $V$ protecting the roots of the subtrees is
freed after the merge operation. Using these assumptions, the
update algorithm for a newly arriving measurement value can
be formulated such that registers $V_1, \dots, V_{d-1}$ always contain
the current state of "active" subtrees of depth $1, \dots, d - 1$, and
thus $V_d$ always contains the current global root value. "Active"
here means a subtree the root of which awaits completion by
merging with a subtree of the same depth. Care is taken in
the formulation so that only the actual measurement value,
protected registers, and the normal extend operation are used,
and no unprotected memory places are involved. Denote an
empty node in the full binary tree of depth $d$ by nil. The tree
formation is performed by Algorithm 1.

Algorithm 1 Tree formation algorithm
Require: $V_1, \dots, V_d \in \mathcal{V}$, $m \in \{0,1\}^{|H|}$
Ensure: $V_1, \dots, V_d = \mathrm{nil}$    ▷ Initialise subtree roots empty.
 1: $n \leftarrow 0$
 2: while $(m \leftarrow \mathrm{RoTM}) \neq \mathrm{nil}$ do    ▷ Get new measurement.
 3:   $m \rightarrow \mathrm{SML}$    ▷ If non-empty, add as new leaf.
 4:   if $\langle n \rangle_d = 1$ then    ▷ A value arriving from right
 5:     $V_d \leftarrow V_d \diamond m$    ▷ extends the root at depth $d - 1$,
 6:     $V_d \rightarrow \mathrm{SML}$    ▷ which is purged to the SML.
 7:     $k \leftarrow d - 1$    ▷ Update subtrees of depth 2, ...,
 8:     while $(\langle n \rangle_k = 1) \wedge (k > 0)$ do    ▷ while coming
 9:       $V_k \leftarrow V_k \diamond V_{k+1}$    ▷ from right.
10:       $V_k \rightarrow \mathrm{SML}$
11:       $k \leftarrow k - 1$
12:     end while
13:     if $k = 0$ then
14:       return "tree full"
15:     end if
16:     $V_k \leftarrow V_{k+1}$
17:   else    ▷ If it is arriving from the left,
18:     $V_d \leftarrow m$    ▷ it is put into the root at depth $d - 1$.
19:   end if
20:   $n \leftarrow n + 1$
21: end while

Algorithm 2 Cleanup of an incomplete tree
22: for $k \leftarrow d - 1$ to $1$ do
23:   if $\langle n \rangle_k = 1$ then
24:     $V_k \leftarrow V_k \diamond V_{k+1}$
25:     $V_k \rightarrow \mathrm{SML}$
26:   else
27:     $V_k \leftarrow V_{k+1}$
28:     $V_k \rightarrow \mathrm{SML}$
29:   end if
30: end for

If $n < 2^d$, the tree is incomplete at the right edge, and
the cleanup procedure shown in Algorithm 2 is then needed.
Algorithm 2 results in a final merge of roots such that $V_1$
ultimately contains all subtree information. Note that this
cleanup procedure is only reached if the tree is not already
full, due to the test in lines 13-15 of Algorithm 1. The rule by
which the tree is completed is that the configuration

      x
     / \
    x   nil

is correct at the right edge. All inner nodes are written to
the SML, even if they are the result of forwarding along a
left edge (entailing minor redundancy). Formally, the above
rule may be interpreted as modifying the notion
of the '$\diamond$' operation such that $x \diamond \mathrm{nil} = x$, as explained in
Appendix A.
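
To make the register pipeline of Algorithm 1 concrete, the following Python sketch models the registers as a list and cross-checks the result against a plain in-memory Merkle root. It is an illustration under assumptions, not a TPM implementation: the names `form_tree` and `reference_root` are hypothetical, SHA-1 stands in for the protected extend, only completely filled trees are exercised, and the cleanup of Algorithm 2 is not modelled.

```python
import hashlib

def extend(v: bytes, m: bytes) -> bytes:
    """Eq. (1): H(v || m), with SHA-1 as in the TPM."""
    return hashlib.sha1(v + m).digest()

def digit(n: int, k: int, d: int) -> int:
    """k-th binary digit of n, counted from the MSB (k = 1..d)."""
    return (n >> (d - k)) & 1

def form_tree(measurements, d):
    """Run the register pipeline over at most 2**d measurement values.

    V[1..d] model the protected registers (V[0] is unused padding).
    Returns (V, sml, full), where full is True once 2**d leaves arrived.
    """
    V = [None] * (d + 1)
    sml = []
    n = 0
    for m in measurements:
        sml.append(m)                          # m -> SML
        if digit(n, d, d) == 1:                # leaf arrives from the right
            V[d] = extend(V[d], m)
            sml.append(V[d])
            k = d - 1
            while k > 0 and digit(n, k, d) == 1:
                V[k] = extend(V[k], V[k + 1])  # merge two equal-depth subtrees
                sml.append(V[k])
                k -= 1
            if k == 0:
                return V, sml, True            # "tree full"
            V[k] = V[k + 1]                    # park the subtree root one level up
        else:                                  # leaf arrives from the left
            V[d] = m
        n += 1
    return V, sml, False

def reference_root(leaves):
    """Plain in-memory Merkle root, used only to cross-check the pipeline."""
    level = list(leaves)
    while len(level) > 1:
        level = [extend(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

d = 3
leaves = [hashlib.sha1(bytes([i])).digest() for i in range(2 ** d)]
V, sml, full = form_tree(leaves, d)
```

For a full tree the value left in `V[1]` coincides with `reference_root(leaves)`, although the pipeline never reads anything back from the log: each step uses only the arriving measurement and register contents.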

Fig. 2. Order of nodes in a tree-formed SML.

It is interesting to note that, if leaves and inner nodes are
appended to the SML in the order prescribed by Algorithm 1,
a natural serialisation of the resulting tree is obtained. This
order is shown in Figure 2 for an incomplete tree of depth
3. The marked entries 10 and 11 in the resulting SML are
identical, since 11 is created by a forward operation of the
cleanup Algorithm 2. The SML order can be used to address
tree nodes in the SML by a binary search. Given a sequence
number $K$ in the SML of length $2^{d+1} - 1$, such a search
proceeds from the root, which is the last entry. The remaining
$2^{d+1} - 2$ entries are equally partitioned into portions of size
$2^d - 1$, and it is decided if $K$ is in the left or right part. This
procedure is iterated until $K$ points to the rightmost element
in the current part. The sequence of decisions made yields the
sequence of left-right edges leading from the root to the node
with index $K$ in the SML.
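
The addressing scheme just described can be sketched as follows. The function name `sml_path` is hypothetical, sequence numbers are taken to be 1-based, and a completely filled tree of depth d is assumed.

```python
def sml_path(K: int, d: int):
    """Left/right decisions leading from the root to SML entry K (1-based).

    Assumes a completely filled tree of depth d, serialised so that each
    subtree occupies a contiguous part of the SML with its root as the
    last entry of that part.
    """
    lo, size = 1, 2 ** (d + 1) - 1    # current part is [lo, lo + size - 1]
    path = []
    while K != lo + size - 1:         # stop when K is the rightmost element
        half = (size - 1) // 2        # size of each of the two sub-parts
        if K < lo + half:
            path.append("L")
        else:
            path.append("R")
            lo += half
        size = half
    return path

# Depth 2: the SML order is m0, m1, h01, m2, m3, h23, root.
paths = {K: sml_path(K, 2) for K in range(1, 8)}
```

In the depth-2 example the last entry is the root (empty path), entries 3 and 6 are its left and right child, and entry 5 is reached by two right edges.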

The tree-formation algorithm can easily be adapted to
trees of arbitrary, uniform arity, say $b$. For this, the binary
coordinate $\langle n \rangle$ has to be replaced by the $b$-ary coordinate
$\langle n \rangle^{(b)}$ and its $d$-th, respectively, $k$-th digit evaluated in line 4,
respectively, 8 of Algorithm 1, where the evaluated expression
has to be changed to $\langle n \rangle^{(b)}_d = b - 1$, respectively, $\langle n \rangle^{(b)}_k = b - 1$.
Algorithm 2 has to be adapted accordingly. A further generalisation
to arbitrary trees requires only establishment of the
associated node coordinates, i.e., of the mapping $n \rightarrow \text{node}$.
Note that at every node with arity higher than 2, since
hash extension is linear for the legs connecting to it, the
disadvantages mentioned in Section II-A apply, and loss of
detection granularity occurs.



B. Maximum Tree Capacity 

It is clear from the generation procedure that, with a limited
number, $V_1, \dots, V_r$, of verification data registers, only a finite
number of components at the leaves of trees can be covered. In
contrast, the hash chain created by the standard, linear extend,
ending in a single PCR value, is in principle of unlimited
length. The maximum capacity of trees generated with $r$ root
registers can be calculated as follows. The procedure for the
first register, $V_1$, can use the $r - 1$ other registers as a pipeline
of length $r - 1$ to build a tree of depth $r$. When $V_1$ is occupied,
the second register can support a tree of depth $r - 1$, and so
on, until the last register, $V_r$, for which the pipeline has length
$0$ and the tree depth $1$. Thus the total number of leaves carried
by the trees of all registers is

\[ C_{\text{trees}} = \sum_{k=1}^{r} 2^k = 2^{r+1} - 2. \tag{2} \]

For $r = 24$, the number of PCRs of a TPM adherent to the
v1.2 specification, this yields 33,554,430 places for component
measurements at the leaves of the $r$ trees. If restricted
to the last 16 PCRs, since, for instance, according to the PC
Client specification of the TCG [2] PCRs 0-7 are reserved,
$C_{\text{trees}}$ still counts 131,070 measurements (see Section V for
a discussion of implementation issues with standard TPMs).
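
The two figures quoted above follow directly from the sum over tree depths and can be checked with a few lines. The function name `tree_capacity` is ours.

```python
def tree_capacity(r: int) -> int:
    """Leaves covered by r registers: trees of depth r, r-1, ..., 1."""
    return sum(2 ** k for k in range(1, r + 1))

capacity_24 = tree_capacity(24)   # all PCRs of a v1.2 TPM
capacity_16 = tree_capacity(16)   # only the last 16 PCRs
```

Both values agree with the closed form, i.e., two to the power r + 1, minus 2.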



Since the number of measurements to be taken during start 
up or at run-time is not a priori known, the last register can, 
as a fallback, be linearly extended after the capacity limit is 
reached. Figure 3 shows this arrangement.

Fig. 3. Maximum capacity arrangement of tree verification data. Measurement
values at the leaves are indicated as $m$.



C. Complexity of Tree Formation 

The spatial complexity of the tree formation algorithm is
very small. Since the internal data needs precisely three locations,
$d \in \{1, \dots, r\}$, $n \in \{0, \dots, 2^d - 1\}$, and $k \in \{1, \dots, d\}$, the
size of that data is at most $d + 2\lceil \log_2 d \rceil \le r + 2\lceil \log_2 r \rceil$
bits. Additionally, depending on implementation, one register
may be required to receive and hold the current measurement
value, and/or as intermediate register for the operations on
verification data registers. The SML increases moderately in
size. For a completely filled binary tree of depth $d$, $2^{d+1} - 2$
node values, including leaf measurements, are stored in the
SML (the root node is contained in a $V_i$). That is, the tree-formed
SML is less than double the size of the linearly formed
SML containing only measurement values.

For an estimation of the temporal complexity, we consider 
a full tree of depth d, i.e., 2'^ leaf measurements. The various 
operations involved in algorithm [T] are 
M Add measurement to Vd\ Vd <— m. 
Sv Store a verification data register to SML; T4 ~^ SML. 
Sm Store measurement to SML; m — >■ SML. 
V Copy verification data register; T4 •«— T4+i- 
El Extend Vd with measurement; Vd-^ VdO m. 
E2 Extend inner node registers; Vk ^ Vk o Vk+i- 
The symbols above denote the operations and their execution 
times interchangeably. The one missing operation m ■(— RoTM 
can be subsumed in Sm- 

By the structure of the tree, the occurrences of the operations
are easily counted. $S_m$ occurs at each leaf, i.e., $2^d$ times.
$E_1$ and $M$ occur at each inner node at depth $d - 1$, i.e., $2^{d-1}$
times. $V$ and $E_2$ occur at each inner node from depth $d - 2$
upward, i.e., $2^{d-1} - 1$ times. Finally, $S_V$ occurs at each inner
node of the tree except the root, which remains in $V_1$. That is,
$S_V$ occurs $2^d - 2$ times. Altogether this yields the estimate

$$2^{d-1}(E_1 + M) + (2^{d-1} - 1)(V + E_2) + 2^d S_m + (2^d - 2) S_V$$

for the algorithm's execution time, disregarding flow control. 
Grouping similar operations $\{E_1, E_2\}$, $\{M, S_V, S_m\}$ yields

$$2^{d-1}(E_1 + E_2) - E_2 + 2^{d-1}(M + 2S_V + 2S_m) - 2S_V + (2^{d-1} - 1)V.$$

Assuming that all memory operations are approximately
equally time-consuming and bounded by a common constant $S$,

$$M \approx S_V \approx \tfrac{1}{2} S_m \approx \tfrac{1}{2} V \le S,$$

(where a factor 2 is included in $V$ for a naive read/store
implementation, and in $S_m$ for the missing operation mentioned
above), and likewise for the extend operations

$$E_1 \approx E_2 \le E,$$

a coarse estimate for the temporal complexity of tree formation
for $d > 1$ is

$$\le 2^d \left(E + \tfrac{9}{2} S\right) - (E + 4S).$$

When extend operations are the dominating factor, it is inter- 
esting to note that tree formation actually needs one extend 
operation less than the linear chain of authenticated boot. 
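
The counting argument can be checked mechanically. The following is a minimal sketch, not part of the original algorithms: the function names `op_counts`, `worst_case_time` and `time_bound` are hypothetical, and the costs are set to their assumed upper bounds ($M = S_V = S$, $S_m = V = 2S$, $E_1 = E_2 = E$).

```python
from fractions import Fraction


def op_counts(d: int) -> dict:
    """Occurrences of each elementary operation for a full tree of depth d."""
    return {
        "E1": 2 ** (d - 1),      # extend at each inner node of depth d-1
        "M": 2 ** (d - 1),       # add measurement at each such node
        "V": 2 ** (d - 1) - 1,   # copy at inner nodes from depth d-2 upward
        "E2": 2 ** (d - 1) - 1,  # extend at inner nodes from depth d-2 upward
        "Sm": 2 ** d,            # store each leaf measurement
        "SV": 2 ** d - 2,        # store each inner node except the root
    }


def worst_case_time(d: int, E, S):
    """Sum of count * cost with every cost set to its assumed upper bound."""
    cost = {"E1": E, "E2": E, "M": S, "SV": S, "Sm": 2 * S, "V": 2 * S}
    return sum(n * cost[op] for op, n in op_counts(d).items())


def time_bound(d: int, E, S):
    """Closed-form coarse estimate 2^d (E + 9/2 S) - (E + 4 S)."""
    return 2 ** d * (E + Fraction(9, 2) * S) - (E + 4 * S)


if __name__ == "__main__":
    for d in range(2, 8):
        counts = op_counts(d)
        extends = counts["E1"] + counts["E2"]
        print(d, extends, 2 ** d, worst_case_time(d, 10, 1) == time_bound(d, 10, 1))
```

Under these assumptions the sum of counts times costs coincides with the closed-form estimate, and the number of extend operations is $2^d - 1$, one less than the $2^d$ extends of a linear chain over the same measurements.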

IV. Validation of Tree-Formed Verification Data 

For the validation of tree-formed verification data, generated
by the procedure of the last section, we now present a
validation strategy which exploits all available information at
every tree node. In Section IV-B the average computational
cost is calculated in relation to the number, respectively the
relative share, of failed measurements.

Taking a linear chain of measurements generated and stored 
in an ordinary authenticated boot and sequentially extended 
to a PCR as the reference case, we see that tree traversal 
validation is significantly different. In the former case, a
manipulation of the SML cannot be localised in principle, while
traversing a tree-formed SML makes it possible to identify a subtree
where a manipulation has occurred. Similar considerations
hold for diagnostic validation, i.e., the search for components 
which do not conform to a desired reference configuration of 
the validated platform (called here failed components). For the 
linear chained SML this requires comparing each measurement 
with a reference value and recalculating the complete chain of 
extend operations up to the PCR to verify the SML's integrity. 
Since manipulations in the linear SML cannot be localised, a 
failure to reproduce the PCR value also means that diagnostic 
validation becomes impossible, and failed components cannot 
be distinguished from good ones. 

For a tree-formed SML, the situation is much better. If
a subtree is identified where manipulation of the SML is
suspected, the complement of it in the SML tree can still
be validated. Also, for diagnostic validation, one may expect
a significant speed-up in determining the set of failed
components, and concurrently verifying the root verification data
register contents.

A. Tree Traversal Validation 

The aim of validation of a tree-formed SML is to find the
subset of leaves failing validation, and to detect manipulations
of the SML, where possible. We assume there is a reference
tree for comparison locally available at the validator. Then,
validation can start from the root of the tree, i.e., a verification
data element $V$, traversing the tree of SML data. This yields
the leaf set of components for which measurements differ from
reference values, called failed components. In traversing the
tree, a depth-first search with pruning is applied, and decisions
are taken at every branching node. Again we assume that the
trees are binary. Then, the SML tree values at a branching
node and its two children are compared with the reference
tree values of the same node positions, and the results are
noted as g (good) for agreement and b (bad) for discrepancy.
In this notation, the following situations can occur, as shown
in Figure 4.

In case (a), the whole subtree below this parent node is 
validated positively, and traversal ends at this node. In the 
cases (b), the parent node is recalculated by the validator 
applying the extend operation to the child node values. If 
the recalculated value does not match the value at the parent 
node, this indicates an SML manipulation in one of the subtrees
with a root marked as bad. This is handled as an exception. 
Otherwise, validation can proceed to the next tree level, 
traversing the subtrees where bad values are found, i.e., left, 
right, or both subtrees in (b), respectively. In cases (c), a 
tree manipulation exception is detected. It should be noted 
that this detection takes place without recalculating an extend 
operation. The last situation, (d), only occurs when the binary 
tree is incomplete, and a right branch is null. Then value x 
must equal value y, in which case traversal proceeds to the 
left, and otherwise a tree manipulation exception occurs. 

Fig. 4. Classification of node configurations in a tree-formed SML.
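
The traversal rules can be sketched in code. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names `Node`, `build` and `validate` are hypothetical, the extend operation is modelled as SHA-1 over the concatenated child values, and the case labels (a) to (d) follow the textual description only, since the figure itself is not reproduced here. In case (d), the parent is taken to be x and its left child y, and the reference tree is assumed to have the same shape as the SML tree.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


def extend(left: bytes, right: bytes) -> bytes:
    """Assumed model of the extend operation: H(left || right) with SHA-1."""
    return hashlib.sha1(left + right).digest()


@dataclass
class Node:
    value: bytes
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(leaves: List[bytes]) -> Node:
    """Build a full binary hash tree; len(leaves) must be a power of two."""
    level = [Node(v) for v in leaves]
    while len(level) > 1:
        level = [Node(extend(a.value, b.value), a, b)
                 for a, b in zip(level[0::2], level[1::2])]
    return level[0]


def validate(sml: Node, ref: Node, path: str = "") -> Tuple[List[str], List[str]]:
    """Return (paths of failed leaves, paths of nodes raising an exception)."""
    good = sml.value == ref.value
    if sml.left is None:                      # leaf: a bad leaf is a failed component
        return ([] if good else [path]), []
    if sml.right is None:                     # case (d): right branch is nil
        if sml.value != sml.left.value:
            return [], [path]
        return validate(sml.left, ref.left, path + "0")
    left_good = sml.left.value == ref.left.value
    right_good = sml.right.value == ref.right.value
    if good:                                  # case (a), else (c) with good parent
        return ([], []) if left_good and right_good else ([], [path])
    if left_good and right_good:              # case (c): bad parent, good children
        return [], [path]
    if extend(sml.left.value, sml.right.value) != sml.value:
        return [], [path]                     # case (b), recalculation mismatch
    failed: List[str] = []
    tampered: List[str] = []
    for child, ref_child, child_good, bit in (
        (sml.left, ref.left, left_good, "0"),
        (sml.right, ref.right, right_good, "1"),
    ):
        if not child_good:                    # good subtrees are pruned
            f, t = validate(child, ref_child, path + bit)
            failed += f
            tampered += t
    return failed, tampered
```

Paths are bit strings from the root ("0" for left, "1" for right). A subtree that raises an exception is not traversed further, which mirrors the rule that it cannot yield trustworthy information about failed components.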

B. Cost for Tree Validation 

A principal advantage of validating tree-formed SMLs is 
that subtrees with a correct root can be discarded from further 
search for failed components. In this section we lay out a 
simple, probabilistic model to quantitatively assess the perfor- 
mance of tree validation. Assume for simplicity that the SML 
is a full tree of depth d. The validator has a complete reference 
tree representing a known, desired platform configuration. We 
assume that recalculating hash operations is the dominant cost 
factor to estimate validation complexity, while comparisons are 
cheap. Assume a random set of failed leaves. 

We use an optimistic validation strategy, called diagnostic
validation, which traverses the paths from the root to failed
components, i.e., components with bad measurement values
with respect to the leaves of the reference tree. The unique
property of this strategy is that it finds all failed components
with authentic measurement values. Diagnostic validation
proceeds as follows. When visiting an inner parent node which
differs from the corresponding node in the reference tree, i.e.,
a bad parent node, one of the situations in Fig. 4 (b), or the
rightmost configuration of (c), is encountered. In the latter case,
no recalculation of the parent node needs to be performed since
it is an obvious SML integrity failure. The subtree with this
root configuration is discarded from further traversal, since
it cannot be assumed to yield trustworthy information about
failed components. In this case, further steps depend on the
validator's policy. The node configurations (b) are precisely
the ones which require recalculation of the parent hash from
the root hashes of its two child subtrees by one extend
operation $\diamond$, to confirm that the configuration, which is
unknown from the validator's reference tree, is authentic. The
subtrees whose roots are good children of the bad parent node
under scrutiny are discarded from further traversal. Note that
this procedure of diagnostic validation implicitly excludes the
configuration (a) and the three left configurations of Fig. 4 (c)
from diagnostic validation. They may be considered in further
forensic evaluation of the SML tree, wherever this makes sense.

Summarising, we see that diagnostic validation requires
visiting, and performing a hash operation at, all bad inner nodes
in the union of all paths from failed (bad) leaves to the root.
In an otherwise untampered tree, this implicitly excludes the
right configuration (c) with bad parent node. Assume that a
subset of i.i.d. bad leaves constitutes a fraction $f \in [0, 1]$ of
all leaves. The number of bad leaves is $2^d f$. The expected number
$E^{\mathrm{inner}}(f)$ of bad inner nodes can be calculated as
explained in Appendix B.




Fig. 5. Expected fraction of bad inner nodes on random distribution of
$2^d f$ bad leaves for $d = 16$.

Figure 5 shows the fraction of the $2^d - 1$ inner nodes, for
$d = 16$, at which a hash operation will occur under the assumptions
above. This represents the number of hash operations which
are necessary to determine the bad components with certainty.
The reference case of a linear SML requires $2^d + 1$ hash
operations to recalculate the final PCR value. This case is
roughly represented by the upper ordinate axis of the figure.
With regard to comparisons to reference values, the situation
is slightly different. Tree traversal for diagnostic validation
descends along the bad inner nodes which fail comparison
with the reference tree's corresponding inner node. For that,
both children of a bad inner node have to be compared in
every case, so that the complexity in terms of comparisons is
twice the number $E^{\mathrm{inner}}(f)$. The linear SML requires
all $2^d$ measurements to be compared with reference values.

If $h$ is the cost of a hash operation at the validator, and
$c$ the cost of a comparison of two hash values (160 bit for
SHA-1), then the total validation cost of the linear case is
$(2^d + 1)h + 2^d c = 2^d(h + c) + h > 2^d(h + c)$. This is the
least effort to obtain the same information from a linear SML
as by diagnostic validation of a tree-formed SML. For the
tree-formed SML on the other hand (including the root in
the count), the cost is $(E^{\mathrm{inner}}(f) + 1)(2c + h)$.
Tree-formed validation is more efficient if

$$\frac{E^{\mathrm{inner}}(f) + 1}{2^d} \le \frac{h + c}{h + 2c} = \frac{\lambda + 1}{2\lambda + 1},$$

where $\lambda = c/h \ll 1$. With a very generous margin, $\lambda < 0.01$,
which yields a bound of 0.99 for the r.h.s. Then, for $d = 16$,
tree-formed validation is expected to be more efficient for
fractions $f$ of bad leaves as high as 85%.
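
The break-even condition can be evaluated numerically. The sketch below assumes the closed-form expressions of Appendix B for the expected number of bad inner nodes; the function names and the chosen values $h = 1$, $c = 0.01$ are illustrative assumptions, not measured costs.

```python
import math


def expected_bad_inner(f: float, d: int) -> float:
    """Expected number of bad inner nodes for a fraction 0 <= f < 1 of bad leaves."""
    # Number of random leaf choices that colours an expected fraction f of leaves.
    k = math.log1p(-f) / math.log1p(-(2.0 ** -d))
    # Sum the expected number of coloured nodes over the levels 0, ..., d-1.
    return sum(n * (1.0 - (1.0 - 1.0 / n) ** k)
               for n in (2 ** level for level in range(d)))


def tree_cost(f: float, d: int, h: float, c: float) -> float:
    """Cost of diagnostic validation of a tree-formed SML (root included)."""
    return (expected_bad_inner(f, d) + 1) * (2 * c + h)


def linear_cost(d: int, h: float, c: float) -> float:
    """Cost of validating a linear SML with 2^d measurements."""
    return (2 ** d + 1) * h + 2 ** d * c


if __name__ == "__main__":
    d, h, c = 16, 1.0, 0.01
    for f in (0.01, 0.1, 0.5, 0.85, 0.9):
        print(f, tree_cost(f, d, h, c) / linear_cost(d, h, c))
```

With these assumed costs the ratio stays below one at $f = 0.85$ and exceeds one at $f = 0.9$, in line with the 85% figure stated above.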

We see that diagnostic validation of a tree-formed SML
always performs better in terms of hash operations than with
a linear SML, and outmatches the linear SML completely
even for large fractions of bad components, under reasonable
assumptions, and becomes vastly advantageous for small
fractions of failed components. It can be expected that tree
validation is yet more efficient when the bad leaves are
non-uniformly distributed, e.g., exhibit clustering. While we have
directly compared linear and diagnostic tree validation, it
should be noted that linear validation becomes impossible if
the recalculation of the final PCR fails, since then, comparison
of single measurements does not yield reliable information:
each measurement could be faked in the SML to hide the
one which broke the hash chain. In conclusion, the principal,
semantic advantage of tree-formed validation data comes about
even at decreased computational complexity for the validator.

V. Implementation Options 

With regard to the tree-formation algorithm itself, to achieve
the same level of security as TCG standard compliant trusted
boot processes, all operations on verification data registers
should run inside the hardware-protected TPM environment.
Most operations of the tree-formation algorithm listed in
Section III-B are, however, not standard TPM functions that can
be executed on standard-conforming PCRs. In fact, only the normal
extend operation $E_1$ is an internal standard function, and $S_V$
and $S_m$ can be realised by PCR read operations.

We first discuss the minimal modifications that would be
necessary to extend a TPM to turn PCRs into tree-formed
verification data registers, while the tree-formation algorithm
may still run outside the TPM. Then, we propose a new
TPM-internal command for tree formation.¹

A. Minimal TPM Modifications for Tree-Formation 

Let us first take a minimalist approach to implementing
tree-formation and identify the smallest set of changes to a
standard TPM that would enable PCRs for use with Algorithms 1 and 2.
This regards implementing the elementary operations listed
in Section III-C by TPM commands or modifications thereof.
The core of the algorithm, including the bookkeeping tasks on
registers representing inner nodes' current states, could then
be realised as a software root of trust for performing
tree-formation in a system integrity measurement process such as
authenticated or secure boot.

The operations $S_V$ and $S_m$ pose no problem and can be
realised by TPM_PCRRead commands or directly in the tree
formation software, respectively. $E_1$ occurs at every right edge
at the lowest level of the tree, and extends a $V$ already
containing a measurement value which came from the left sibling
of the measurement which is extended into $V$. Therefore, $E_1$
is precisely the standard TPM_Extend operation defined by
(1). $E_2$ also occurs at right edges inside the tree and, in turn,
is straightforwardly modelled by TPM_PCRRead followed by
a TPM_Extend.

Operations $M$ and $V$ occur at left edges on the lowest level
of, respectively inside, the tree. They pose a particular problem
for two reasons. First, PCRs cannot be directly written to, and
a natural approach to reset them via TPM_PCR_Reset as a
first step in $M$ or $V$ is problematic, since only PCRs above
16 of a standard TPM can be reset, and only from the correct
locality. Thus it is necessary that enough PCRs are resettable
and that they respond to the locality in which the
tree-formation software is executed as trusted code. Secondly,
even after reset, the only operation which can modify a PCR,
TPM_Extend, does not directly copy a value into the register
but truly executes (1) with the existing value of the reset PCR,
which is 160 bit binary 0x00, and the input value, which yields
a result different from the input value. One option, which
avoids exposing new commands directly writing to, or shifting
values between PCRs, would be to augment PCRs with a
reset flag which indicates that they are in a pristine state
after reset. Then, TPM_Extend can be modified such that it
directly writes into the PCR when this flag is true, and then
sets it to false.

Realising that $M$ and $V$ consistently occur at left edges
of a tree, and only if the right sibling is empty (nil), and
then deterministically produce an outcome depending only on
the two siblings involved, a third option would be to deviate
slightly from the definition of a Merkle hash tree. The correct
configuration of values in every elementary triangle in the
SML tree would then be as follows.

        $(0 \diamond x) \diamond y$
              /        \
           $x$          $y$

¹ A third variant, which is not further discussed here, is a software-based
implementation of tree-formed verification data, where the root registers are
soft registers managed by a trusted application, and where the current state
of such registers is protected by a 'real' register, e.g., a PCR.



That is, $V$ or $M$ is modelled by TPM_PCR_Reset followed
by TPM_Extend to obtain $0 \diamond x = H(0 \| x)$ in the first step.
The right sibling is then normally extended in that register
and the result written to the SML. See Appendix A for a
consistent treatment of nil node values in intermediate stages
and finalisation of a tree.
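
The register behaviour discussed in this subsection can be illustrated with a small software model. This is a sketch only and does not call any real TPM interface: `extend` models equation (1) with SHA-1, `FlaggedPCR` is a hypothetical model of the reset-flag option, and `triangle_root` is a hypothetical helper for the modified triangle rule.

```python
import hashlib

PCR_SIZE = 20  # SHA-1 digest length in bytes


def extend(pcr: bytes, value: bytes) -> bytes:
    """Model of the standard extend operation: new value is H(pcr || value)."""
    return hashlib.sha1(pcr + value).digest()


class FlaggedPCR:
    """Hypothetical PCR augmented with a 'pristine after reset' flag."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.value = bytes(PCR_SIZE)  # 160 bit binary zero
        self.pristine = True

    def extend(self, value: bytes) -> bytes:
        if self.pristine:
            # First extend after reset writes directly, playing the role of M or V.
            self.value = value
            self.pristine = False
        else:
            self.value = extend(self.value, value)
        return self.value


def triangle_root(x: bytes, y: bytes) -> bytes:
    """Parent value in the modified tree: (0 extended by x) extended by y."""
    return extend(extend(bytes(PCR_SIZE), x), y)


if __name__ == "__main__":
    x = hashlib.sha1(b"left component").digest()
    y = hashlib.sha1(b"right component").digest()
    # A plain reset followed by extend does not copy x into the register.
    print(extend(bytes(PCR_SIZE), x) == x)
    pcr = FlaggedPCR()
    print(pcr.extend(x) == x, pcr.extend(y) == extend(x, y))
    print(triangle_root(x, y).hex())
```

The flag variant keeps the usual Merkle relation between parent and children, while the triangle variant needs no new register state but changes the value a validator has to recompute at every inner node.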

B. TPM_Tree_Extend 

The split TPM/software implementation of tree formation
compromises on the security level of the resulting root
verification data register values. It is preferable that tree-formed
verification data is produced by a TPM-internal implementation
of the proposed algorithms. For this, a TPM modification can
work as follows. The modified TPM exposes a new command
TPM_Tree_Extend with the same input parameters as the
usual TPM_Extend command. The TPM maintains flags for
PCRs signifying which of them are currently designated tree
roots, which are occupied and locked, and which are usable
as intermediate $V$s by the algorithm. Furthermore, the TPM
maintains the additional data mentioned in Section III-C. In the
simplest case, internal logic prevents concurrent use of more
than one PCR for tree formation. While TPM_Extend outputs
only the update of the target PCR value, TPM_Tree_Extend
returns a variable number $1, \ldots, d$ of updated verification
register data values in sequence such that they produce the natural
order described in Section III-A. These return values are the
output of the SML write operations of Algorithms 1 and 2.
When $d$ values are returned, the receiver knows that this tree
is exhausted and the corresponding root $V$ locked. Another
option not considered here is for TPM_Tree_Extend to
return all intermediate $V$s on each call.

VI. Conclusion 

Though hash trees are widely used, ours is the first proposal, 
to the best of our knowledge, to use Merkle hash trees to 
protect the integrity of the secure start up process of a trusted 
platform in the same way as is traditionally done with PCRs. 
We have demonstrated the efficiency and flexibility gains 
resulting from using tree-formed verification data in platform 
validation. This may be effective in particular in the remote 
validation and management of platforms via a network. Given 
the small size and complexity of the tree-formation algorithm, 
it seems possible to implement all these operations directly 
inside the TPM, if specifications are amended accordingly. 
This may be a feasible approach for future TPM generations. 

With regard to generalisations, trees are certainly not the
most general structures for which integrity protection using
cryptographic digests can be applied. For instance, some
researchers have extended hashes to provide identification
of directed graphs [48]. Others have applied variant
one-way functions, e.g., multi-set hashes [49], to uniquely identify
complex data structures such as RDF graphs [50]. Along these
lines, generalisation of tree-formed verification data to, for
instance, directed acyclic graphs, and dependence graphs [32],
[33] can be conceived. While potentially interesting for complex
platform management and protection tasks, every such
generalisation would incur increased complexity and lose the
efficiency of binary trees for validation. Application cases for
such generalisations are therefore deferred to further study.

The single command extension of the TPM integrity measurement
functionality, TPM_Tree_Extend, proposed above
is, however, only the starting point of a flexible, TPM-based
tree verification data management architecture. In particular 
it would be desirable to enable secure updates of subtree 
roots, for instance for dynamic platform management, and 
ultimately to quote an inner node of a tree-formed SML with 
the same security assertions as TPM_Quote provides to a 
remote validator for a PCR value. This shall be discussed 
elsewhere. 

Appendix A 
A Useful Convention 

In many cases, the hash tree stored in the SML will
be incomplete, i.e., contain empty leaves and inner nodes.
In the continuous measurement process, such nodes, with
value denoted nil, are treated procedurally by the operations
$M$ and $V$ (see Section III-C), which means that right nil
siblings are ignored. This happens in lines 18 and 16 of
Algorithm 1 for intermediate stages of tree formation, and
in line 27 of Algorithm 2 at completion of the tree after the
last measurement.

Generally, i.e., transgressing the restrictions of a standard
TPM, it may be useful to assume that nil is a two-sided unit
for the operation $\diamond$, i.e.,

$$x \diamond \mathit{nil} = \mathit{nil} \diamond x = x, \quad\text{and}\quad \mathit{nil} \diamond \mathit{nil} = \mathit{nil}. \qquad (3)$$

This convention manifests rule (d) of Section IV-A. It is a
re-interpretation of the usual extend operation and can also be
used to eliminate the operations $M$ and $V$ in the algorithms'
formulations. Namely, $M$ and $V$ can be replaced by a reset
of a register $V$ to nil followed by the operation $V \leftarrow V \diamond m$,
respectively $V \leftarrow V \diamond V'$.

For the implementation of this convention, we may assume
that nil is to be represented as an additional flag of PCR
registers, and the inputs and output of $\diamond$. For a PCR, the
nil flag is set by a particular reset command. When nil is
encountered as the input of an extend operation to a PCR,
then logic of the TSS, or a TPM modification, may prevent
execution of the hash operation (1) and write to the PCR
directly.
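
Convention (3) is easy to model in software. The following sketch represents nil by Python's `None` and models the hash in (1) with SHA-1; the names `extend_nil` and `tree_root` are hypothetical, and the padding of an odd level with nil is an assumption about how an incomplete tree would be finalised under this convention.

```python
import hashlib
from typing import List, Optional

Value = Optional[bytes]  # None plays the role of nil


def extend_nil(a: Value, b: Value) -> Value:
    """Extend operation with nil as a two-sided unit, as in convention (3)."""
    if a is None:
        return b          # nil extended by b is b (also covers nil, nil)
    if b is None:
        return a          # a extended by nil is a
    return hashlib.sha1(a + b).digest()


def tree_root(leaves: List[Value]) -> Value:
    """Root of a possibly incomplete binary tree over a non-empty leaf list."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(None)  # missing right sibling is nil
        level = [extend_nil(a, b) for a, b in zip(level[0::2], level[1::2])]
    return level[0]


if __name__ == "__main__":
    a, b, c = (hashlib.sha1(s).digest() for s in (b"a", b"b", b"c"))
    # With three measurements, the right sibling of c is nil and is ignored.
    print(tree_root([a, b, c]) == extend_nil(extend_nil(a, b), c))
```

In this model a parent with a nil right child carries the same value as its left child, which is the equality checked in rule (d) of the traversal.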

Appendix B 
The Expected Number of Bad Inner Nodes 

The problem under consideration is that of bi-colouring (bad
vs. good inner nodes) of a binary tree generated by a random,
i.i.d. choice of leaves and colouring of the path connecting
each of them to the root. Random choices of such leaves and paths
are equivalent to random choices of i.i.d. bit strings of length $d$.
We first calculate the expected number $E^N_k$ of coloured leaves
after $k$ choices from the set of $N = 2^d$ leaves. Recursively,
$E^N_0 = 0$, and

$$E^N_{k+1} = E^N_k + 1 - \frac{E^N_k}{N}.$$

Solving this obtains

$$E^N_k = N\left(1 - (1 - N^{-1})^k\right).$$



Since all substrings of the chosen bit-strings are statistically
independent, the same argument applies to inner nodes at all
levels $d - 1, \ldots, 0$. Thus, the expected number of coloured
inner nodes is obtained by summation

$$E^{\mathrm{inner}}_k = \sum_{\ell=0}^{d-1} E^{2^\ell}_k.$$



It remains to find the expected number of choices $k$ which
corresponds to a certain expected number $E^N_k = fN$ of
coloured leaves, where $0 \le f \le 1$ is a target fraction of
leaves. Solving this equation for $k$ yields

$$k = \frac{\ln(1 - f)}{\ln(1 - 2^{-d})},$$

where $N = 2^d$ was inserted. From this, the expected number
of bad inner nodes in dependency of $f$, $E^{\mathrm{inner}}(f)$, can be
calculated.
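
The closed form can be cross-checked against the recursion and against a direct simulation of random leaf choices. The sketch below is illustrative only; the function names are hypothetical, and leaves are drawn with replacement, which is an assumption matching the i.i.d. model above.

```python
import random


def closed_form(n_nodes: int, k: int) -> float:
    """Expected number of coloured nodes among n_nodes after k i.i.d. choices."""
    return n_nodes * (1.0 - (1.0 - 1.0 / n_nodes) ** k)


def by_recursion(n_nodes: int, k: int) -> float:
    """Same quantity via the recursion E_{k+1} = E_k + 1 - E_k / N, E_0 = 0."""
    e = 0.0
    for _ in range(k):
        e += 1.0 - e / n_nodes
    return e


def expected_inner(d: int, k: int) -> float:
    """Expected number of coloured inner nodes, summed over levels 0..d-1."""
    return sum(closed_form(2 ** level, k) for level in range(d))


def simulate_bad_inner(d: int, k: int, rng: random.Random) -> int:
    """Count distinct inner nodes on the paths from the root to k random leaves."""
    bad = set()
    for _ in range(k):
        leaf = rng.randrange(2 ** d)
        for level in range(d):
            # The inner node at this level is the leaf's bit prefix of that length.
            bad.add((level, leaf >> (d - level)))
    return len(bad)


if __name__ == "__main__":
    rng = random.Random(0)
    d, k, trials = 10, 300, 200
    mean = sum(simulate_bad_inner(d, k, rng) for _ in range(trials)) / trials
    print(mean, expected_inner(d, k))
```

The simulated mean and the summed closed form should agree up to sampling noise, which gives some confidence in the level-wise summation before it is used in the cost estimate of Section IV-B.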